Relational Learning of Pattern-Match Rules for Information Extraction

Authors

  • Mary Elaine Califf
  • Raymond J. Mooney
Abstract

Information extraction is a form of shallow text processing that locates a specified set of relevant items in natural language documents. Such systems can be useful, but they require domain-specific knowledge and rules and are time-consuming and difficult to build by hand, making information extraction a good testbed for the application of machine learning techniques to natural language processing. This paper presents a system, RAPIER, that takes pairs of documents and filled templates and induces pattern-match rules that directly extract fillers for the slots in the template. The learning algorithm incorporates techniques from several inductive logic programming systems and learns unbounded patterns that include constraints on the words and part-of-speech tags surrounding the filler. Encouraging results are presented on learning to extract information from computer job postings from the newsgroup misc.jobs.offered.
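
To make the idea of a pattern-match rule concrete, here is a minimal sketch of applying one such rule to a tagged sentence. It is an illustration only, not RAPIER's implementation: the names `Item`, `Rule`, and `extract` are hypothetical, the split into pre-filler, filler, and post-filler parts is an assumed way to model "constraints on the words and part-of-speech tags surrounding the filler", each pattern element here matches exactly one token, and the rule is written by hand rather than learned.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass(frozen=True)
class Item:
    """Hypothetical pattern element that matches exactly one token.

    A constraint left as None is unconstrained. Words are compared
    case-insensitively, so `words` should hold lowercase strings.
    """
    words: Optional[FrozenSet[str]] = None
    tags: Optional[FrozenSet[str]] = None

    def matches(self, token: Token) -> bool:
        word, tag = token
        if self.words is not None and word.lower() not in self.words:
            return False
        if self.tags is not None and tag not in self.tags:
            return False
        return True


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: context before, the filler itself, context after."""
    slot: str
    pre: Tuple[Item, ...]
    filler: Tuple[Item, ...]
    post: Tuple[Item, ...]


def _match_at(items: Tuple[Item, ...], tokens: List[Token], start: int) -> bool:
    """True if every item matches the token at the corresponding position."""
    end = start + len(items)
    if start < 0 or end > len(tokens):
        return False
    return all(item.matches(tok) for item, tok in zip(items, tokens[start:end]))


def extract(rule: Rule, tokens: List[Token]) -> List[str]:
    """Return every filler string whose surrounding context satisfies the rule."""
    fillers = []
    n_pre, n_fill = len(rule.pre), len(rule.filler)
    for i in range(n_pre, len(tokens) - n_fill + 1):
        if (_match_at(rule.pre, tokens, i - n_pre)
                and _match_at(rule.filler, tokens, i)
                and _match_at(rule.post, tokens, i + n_fill)):
            fillers.append(" ".join(word for word, _ in tokens[i:i + n_fill]))
    return fillers


# Hand-written example: a proper noun preceded by "in" and followed by a comma.
tokens = [("located", "VBN"), ("in", "IN"), ("Atlanta", "NNP"),
          (",", ","), ("Georgia", "NNP"), (".", ".")]
rule = Rule(
    slot="city",
    pre=(Item(words=frozenset({"in"})),),
    filler=(Item(tags=frozenset({"NNP"})),),
    post=(Item(words=frozenset({","})),),
)
result = extract(rule, tokens)
```

In this sketch "Georgia" is not extracted even though it is a proper noun, because its surrounding words do not satisfy the rule's context constraints. A learner in the spirit of the abstract would induce such constraints from document and filled-template pairs instead of having them written by hand.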

Similar resources

Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction

Information extraction is a form of shallow text processing that locates a specified set of relevant items in a natural-language document. Systems for this task require significant domain-specific knowledge and are time-consuming and difficult to build by hand, making them a good application for machine learning. We present an algorithm, RAPIER, that uses pairs of sample documents and filled te...

Learning Relational Structure for Temporal Relation Extraction

Recently there has been a lot of interest in using Statistical Relational Learning (SRL) models for Information Extraction (IE). One of the important IE tasks is extraction of temporal relations between events and time expressions (timex). SRL methods that use hand-written rules have been proposed for various IE tasks. In contrast, we propose an approach that employs structure learning in SRL t...

Learning Relational Features with Backward Random Walks

The path ranking algorithm (PRA) has been recently proposed to address relational classification and retrieval tasks at large scale. We describe Cor-PRA, an enhanced system that can model a larger space of relational rules, including longer relational rules and a class of first order rules with constants, while maintaining scalability. We describe and test faster algorithms for searching for th...

Towards First-Order Random Walk Inference

Path Ranking Algorithm (PRA) addresses classification and retrieval tasks using learned combinations of labeled paths through a graph. Unlike most Statistical Relational Learning (SRL) methods, PRA scales to large data sets but uses a limited set of paths in its models—ones that correspond to short first order rules with no constants. We consider extending PRA in two ways—learning paths that co...

Information Extraction from Patients' Free Form Documentation

The paper presents rule-based information extraction (IE) from two types of patients’ documentation in Polish. For both document types, values of sets of attributes were assigned using specially designed grammars. Various rule-based, statistical, and machine learning methods have been developed for the purpose of information extraction. Unfortunately, they have ...

Publication date: 1997